Sentiment =
Feelings – Attitudes – Emotions – Opinions
A thought, view, or attitude, especially one based mainly on emotion instead of reason
Subjective impressions, not facts
Sentiment Analysis
Opinion mining
Sentiment mining
Subjectivity analysis
Book: is this review positive or negative?
Products: what do people think about the new iPhone?
Blog:
Politics: what do people think about this candidate or issue?
Twitter:
Movie: is this review positive or negative?
Marketing: how is consumer confidence? Consumer attitudes? Trend?
Prediction: predict election outcomes or market trends from sentiment
Healthcare:
Regular opinions: Sentiment/opinion expressions on some target entities
Direct opinions:
Indirect opinions:
Comparative opinions: Comparison of more than one entity.
We focus on regular opinions first, and just call them opinions.
An opinion is a quintuple (entity, aspect, sentiment, holder, time)
where
entity: the target entity (or object).
aspect: an aspect (or feature) of the entity.
sentiment: +, -, or neu, a rating, or an emotion.
holder: the opinion holder.
time: the time when the opinion was expressed.
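The quintuple maps directly onto a simple record type. A minimal sketch in Python (field values are invented for illustration):

```python
from typing import NamedTuple

class Opinion(NamedTuple):
    """The opinion quintuple: (entity, aspect, sentiment, holder, time)."""
    entity: str      # target entity, e.g. a product
    aspect: str      # aspect (feature) of the entity
    sentiment: str   # "+", "-", "neu", a rating, or an emotion label
    holder: str      # who expressed the opinion
    time: str        # when the opinion was expressed

# Hypothetical example drawn from the iPhone review scenario:
op = Opinion(entity="iPhone", aspect="touch screen",
             sentiment="+", holder="user123", time="2024-01-15")
print(op.sentiment)  # "+"
```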
Simplest task:
More complex:
Advanced:
Aspect:
Touch screen
Positive: 212
The touch screen was really cool.
The touch screen was so easy to use and can do amazing things.
…
Negative: 6
The screen is easily scratched.
I have a lot of difficulty in removing finger marks from the touch screen.
…
Aspect:
…
Which features to use?
How to interpret features for sentiment detection?
Harder than topical classification, for which bag-of-words features perform well
Must consider other features due to…
Using sentiment words and phrases: good, wonderful, awesome, troublesome, cost an arm and leg
Not completely unsupervised!
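A lexicon-counting baseline makes the idea concrete. This is a minimal sketch: the two word sets below are toy lists for illustration, not a real published lexicon.

```python
# Toy mini-lexicons for illustration only; real systems draw on
# published sentiment lexicons (General Inquirer, LIWC, MPQA, etc.).
POS_WORDS = {"good", "wonderful", "awesome", "amazing", "cool"}
NEG_WORDS = {"troublesome", "bad", "awful", "scratched", "difficulty"}

def lexicon_sentiment(text: str) -> str:
    """Classify by counting lexicon hits: a crude but common baseline."""
    tokens = text.lower().split()
    pos = sum(t in POS_WORDS for t in tokens)
    neg = sum(t in NEG_WORDS for t in tokens)
    if pos > neg:
        return "+"
    if neg > pos:
        return "-"
    return "neu"

print(lexicon_sentiment("the touch screen was really cool"))  # "+"
```

As the slide warns, this is not completely unsupervised: someone had to build the word lists.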
Home page: http://www.wjh.harvard.edu/~inquirer
List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
Free for Research Use
Philip J. Stone, Dexter C Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press
Home page: http://www.liwc.net/
2300 words, >70 classes
Affective Processes
Cognitive Processes
Pronouns, Negation (no, never), Quantifiers (few, many)
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX
6885 words from 8221 lemmas
Each word annotated for intensity (strong, weak)
GNU GPL
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
6786 words
Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.
Home page: http://sentiwordnet.isti.cnr.it/
All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
[estimable(J,3)] “may be computed or estimated”
Pos: 0   Neg: 0   Obj: 1
[estimable(J,1)] “deserving of respect or high regard”
Pos: .75   Neg: 0   Obj: .25
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010 SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010
Christopher Potts, Sentiment Tutorial, 2011
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
Is logical negation (no, not) associated with negative sentiment?
Potts experiment:
Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181
Positive phrases co-occur more with “excellent”
Negative phrases co-occur more with “poor”
But how to measure co-occurrence?
\[I(X,Y) = \sum_{x \in X} \sum_{y \in Y}{P(x,y)\log_2{\frac{P(x,y)}{P(x)P(y)}}}\]
\[\mathrm{PMI}(x,y)=\log_2{\frac{P(x,y)}{P(x)P(y)}}\]
\[\mathrm{PMI}(word_1,word_2)=\log_2{\frac{P(word_1,word_2)}{P(word_1)P(word_2)}}\]
Estimate probabilities from search-engine hit counts: \(P(word) \approx hits(word)/N\) and \(P(word_1, word_2) \approx hits(word_1 \: \mathrm{NEAR} \: word_2)/N^2\), so the factors of \(N\) cancel:
\[\mathrm{PMI}(word_1,word_2)=\log_2{\frac{hits(word_1 \: \mathrm{NEAR} \: word_2)}{hits(word_1)\,hits(word_2)}}\]
\[ \begin{align} \mathrm{Polarity}(phrase) &= \mathrm{PMI}(phrase, \mathrm{"excellent"}) - \mathrm{PMI}(phrase, \mathrm{"poor"}) \\ &= \log_2{\frac{hits(phrase \: \mathrm{NEAR} \: \mathrm{"excellent"})}{hits(phrase)\,hits(\mathrm{"excellent"})}} - \log_2{\frac{hits(phrase \: \mathrm{NEAR} \: \mathrm{"poor"})}{hits(phrase)\,hits(\mathrm{"poor"})}} \\ &= \log_2{\frac{hits(phrase \: \mathrm{NEAR} \: \mathrm{"excellent"})}{hits(phrase)\,hits(\mathrm{"excellent"})} \cdot \frac{hits(phrase)\,hits(\mathrm{"poor"})}{hits(phrase \: \mathrm{NEAR} \: \mathrm{"poor"})}} \\ &= \log_2{\frac{hits(phrase \: \mathrm{NEAR} \: \mathrm{"excellent"})\, hits(\mathrm{"poor"})}{hits(phrase \: \mathrm{NEAR} \: \mathrm{"poor"})\, hits(\mathrm{"excellent"})}} \end{align} \]
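The final form of the polarity score depends only on four hit counts. A minimal sketch (the counts passed in below are made up, not real search-engine numbers):

```python
import math

def polarity(hits_near_exc, hits_near_poor, hits_exc, hits_poor):
    """Turney-style polarity of a phrase:
    log2( hits(phrase NEAR "excellent") * hits("poor")
        / (hits(phrase NEAR "poor") * hits("excellent")) )
    Positive values lean toward "excellent", negative toward "poor".
    """
    return math.log2((hits_near_exc * hits_poor) /
                     (hits_near_poor * hits_exc))

# Hypothetical counts for some phrase; a real system would query
# a search engine with the NEAR operator to obtain them.
print(polarity(hits_near_exc=80, hits_near_poor=10,
               hits_exc=2000, hits_poor=1800))  # positive => "excellent"-like
```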
| Phrase | POS.tags | Polarity |
|---|---|---|
| online service | JJ NN | 2.8 |
| online experience | JJ NN | 2.3 |
| direct deposit | JJ NN | 1.3 |
| local branch | JJ NN | 0.42 |
| … | | |
| low fees | JJ NNS | 0.33 |
| true service | JJ NN | -0.73 |
| other bank | JJ NN | -0.85 |
| inconveniently located | RB VBN | -1.5 |
| Average | | 0.32 |
| Phrase | POS.tags | Polarity |
|---|---|---|
| direct deposits | JJ NNS | 5.8 |
| online web | JJ NN | 1.9 |
| very handy | RB JJ | 1.4 |
| … | | |
| virtual monopoly | JJ NN | -2 |
| lesser evil | RBR JJ | -2.3 |
| other problems | JJ NNS | -2.8 |
| low funds | JJ NNS | -6.8 |
| unethical practices | JJ NNS | -8.5 |
| Average | | -1.2 |
S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004
M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004
The problem has been studied by numerous researchers.
Key: feature engineering. A large set of features have been tried by researchers. E.g.,
Naïve Bayes (assumes features are conditionally independent given the class)
Maximum Entropy classifier (makes no feature-independence assumption)
SVM
Markov Blanket Classifier
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79—86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278
Tokenization
Feature Extraction
Classification using different classifiers
Deal with HTML and XML markup
Twitter mark-up (names, hash tags)
Capitalization (preserve for words in all caps)
Phone numbers, dates
Emoticons
Useful code:
How to handle negation
Which words to use?
Only adjectives
All words
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79—86.
Add NOT_ to every word between negation and following punctuation:
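A minimal sketch of this NOT_-marking step (the negation word list and the example sentence are illustrative; the technique is from Das & Chen 2001 and Pang et al. 2002, cited above):

```python
import re

def mark_negation(text: str) -> str:
    """Prefix NOT_ to every token between a negation word and the
    following punctuation mark, so negated words become distinct
    features (e.g. like vs. NOT_like)."""
    negations = {"not", "no", "never", "n't", "didn't", "isn't", "don't"}
    out, negating = [], False
    for tok in re.findall(r"[\w']+|[.,!?;]", text.lower()):
        if tok in negations:
            negating = True          # negation opens a scope
            out.append(tok)
        elif tok in ".,!?;":
            negating = False         # punctuation closes the scope
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return " ".join(out)

print(mark_negation("didn't like this movie, but I enjoyed the book."))
# didn't NOT_like NOT_this NOT_movie , but i enjoyed the book .
```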
\[C_{NB} = \underset{c_j \in C}{\operatorname{argmax}}P(c_j) \prod_{i \in positions}{P(w_i|c_j)} \]
\[\hat{P}(w|c) = \frac{count(w,c) + 1}{count(c) + |V|}\]
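The two formulas above fit in a few lines of code. A minimal sketch with made-up toy documents (a real setup would use tokenized reviews):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label). Returns class priors (as counts),
    per-class word counts, and the vocabulary."""
    priors = Counter(label for _, label in docs)
    counts = {c: Counter() for c in priors}
    for tokens, label in docs:
        counts[label].update(tokens)
    vocab = {w for c in counts for w in counts[c]}
    return priors, counts, vocab

def predict_nb(tokens, priors, counts, vocab):
    """argmax_c P(c) * prod_i P(w_i|c), with add-1 (Laplace) smoothing,
    computed in log space to avoid underflow."""
    n_docs = sum(priors.values())
    best, best_lp = None, float("-inf")
    for c in priors:
        lp = math.log(priors[c] / n_docs)
        total = sum(counts[c].values())
        for w in tokens:
            lp += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy training data:
docs = [("good great fun".split(), "+"),
        ("bad awful boring".split(), "-"),
        ("great fun".split(), "+")]
model = train_nb(docs)
print(predict_nb("great fun".split(), *model))  # "+"
```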
Break up data into 10 folds
For each fold
Choose the fold as a temporary test set
Train on 9 folds, compute performance on the test fold
Report average performance of the 10 runs
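The 10-fold procedure above can be sketched generically; `train_fn` and `eval_fn` here are hypothetical stand-ins for any trainer and scorer (the demo uses a trivial majority-class model on made-up labels):

```python
from collections import Counter

def cross_validate(data, train_fn, eval_fn, k=10):
    """Split data into k folds; each fold serves once as the held-out
    test set while the other k-1 folds train the model; report the
    average performance over the k runs."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(eval_fn(train_fn(train), test))
    return sum(scores) / k

# Toy demo: majority-class "model" on made-up labeled docs.
data = [("doc%d" % i, "+" if i % 3 else "-") for i in range(20)]
train_fn = lambda train: Counter(y for _, y in train).most_common(1)[0][0]
eval_fn = lambda model, test: sum(y == model for _, y in test) / len(test)
print(cross_validate(data, train_fn, eval_fn))  # 0.65
```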
Negation is important
Using all words (in Naïve Bayes) works well for some tasks
Finding subsets of words may help in other tasks
Explicit aspects: Aspects explicitly mentioned as nouns or noun phrases in a sentence
Implicit aspects: Aspects not explicitly mentioned in a sentence but are implied
Some work has been done (Su et al. 2009; Hai et al 2011)
“Trying out Chrome because Firefox keeps crashing.”
Firefox - negative; no opinion about Chrome.
We need to segment the sentence into clauses to decide that “crashing” only applies to Firefox(?).
But how about these
“I changed to Audi because BMW is so expensive.”
“I did not buy BMW because of the high price.”
“I am so happy that my iPhone is nothing like my old ugly Droid.”
These two sentences are from paint reviews.
“For paintX, one coat can cover the wood color.”
“For paintY, we need three coats to cover the wood color.”
We know that paintX is good and paintY is not, but how can a system figure this out?
“My goal is to get a tv with good picture quality”
“The top of the picture was brighter than the bottom.”
“When I first got the airbed a couple of weeks ago it was wonderful as all new things are, however as the weeks progressed I liked it less and less.”
Conditional sentences are hard to deal with (Narayanan et al. 2009)
“If I can find a good camera, I will buy it.”
But conditional sentences can have opinions
Questions are also hard to handle
“Are there any great perks for employees?”
“Any idea how to fix this lousy Sony camera?”
Sarcastic sentences
Sarcastic sentences are common in political blogs, comments and discussions.
Sentiment: Positive, Negative, Neutral
Emotion: angry, sad, joyful, fearful, ashamed, proud, elated
Disease: Healthy, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow
While some classification algorithms naturally permit the use of more than two classes and/or labels, others are by nature binary algorithms; these can, however, be turned into multinomial classifiers by a variety of strategies.
A common strategy is one-vs-all, which involves training a single classifier per class, with the samples of that class as positive samples and all other samples as negatives.
Train a logistic regression classifier \(h_\theta^{(i)}(x)\) for each class \(i\) to predict the probability that \(y=i\)
Given a new input \(x\), pick the class \(i\) that maximizes
\[\max_i{h_\theta^{(i)}(x)}\]
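A minimal sketch of this prediction rule, assuming each class already has a trained logistic-regression weight vector (the weights and feature values below are invented for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ova_predict(x, classifiers):
    """One-vs-all: score x with each per-class classifier h_theta^(i)
    (here a logistic unit over the features), then pick the class i
    that maximizes h_theta^(i)(x)."""
    scores = {c: sigmoid(sum(w * xi for w, xi in zip(theta, x)))
              for c, theta in classifiers.items()}
    return max(scores, key=scores.get)

# Hypothetical trained weight vectors for 3 classes over 2 features:
classifiers = {"pos": [2.0, -1.0], "neg": [-2.0, 1.0], "neu": [0.1, 0.1]}
print(ova_predict([1.0, 0.0], classifiers))  # "pos"
```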
Ex:
Naïve Bayes
Estimate \(P(Y)\) and \(P(X|Y)\)
Prediction
\[\hat{y} = \underset{y}{\operatorname{argmax}}P(Y = y)P(X = x|Y = y)\]
Ex:
Logistic regression
Estimate \(P(Y|X)\) directly
(Or a discriminant function: e.g., SVM)
Prediction
\[\hat{y} = \underset{y}{\operatorname{argmax}}\,P(Y = y|X = x)\]
In the multiclass setting, one-vs-all requires each base classifier to produce a real-valued score for its decision, rather than just a class label; the final label is the class with the highest score.
In the multilabel setting, this strategy predicts every label for which the respective classifier returns a positive result for the sample.
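The multilabel variant swaps the argmax for a per-classifier threshold. A minimal sketch (the label names, weights, and threshold are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_predict(x, classifiers, threshold=0.5):
    """Multi-label one-vs-all: instead of a single argmax, return
    every label whose classifier fires above the threshold."""
    return sorted(c for c, theta in classifiers.items()
                  if sigmoid(sum(w * xi for w, xi in zip(theta, x))) > threshold)

# Hypothetical per-label weight vectors (labels that may co-occur):
classifiers = {"angry": [2.0, 0.0], "sad": [1.0, 1.0], "joyful": [-3.0, -3.0]}
print(multilabel_predict([1.0, 1.0], classifiers))  # ['angry', 'sad']
```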
Sentiment Analysis
Multiclass and Multi-label classification